bigio (Contributor) commented Oct 6, 2025

"SÉCURITÉ" makes this rule hit

fkoyer commented Oct 6, 2025

Looks good, but you should probably change s\xc3\x89curit\xc3 to s\xc3\x89curit\xc3\x89 so that the final character is complete: in UTF-8, "É" is the two bytes \xc3\x89, and the current pattern stops after the first of them.
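To illustrate the truncation, here is a minimal sketch using Python's re module on raw bytes. It is only a stand-in for the Perl regex engine that the rule actually runs under, and it assumes the rule sees the subject as UTF-8 bytes; the variable names are made up for the example.

```python
import re

# UTF-8 encodings of the two accented characters involved.
E_ACUTE_UPPER = "É".encode("utf-8")  # two bytes: C3 89
E_ACUTE_LOWER = "é".encode("utf-8")  # two bytes: C3 A9

subject = "SÉCURITÉ".encode("utf-8")

# Pattern as first proposed: it ends on the lead byte \xc3 of the final "É".
truncated = re.compile(rb"s\xc3\x89curit\xc3", re.IGNORECASE)
# Pattern with the trailing \x89 restored.
complete = re.compile(rb"s\xc3\x89curit\xc3\x89", re.IGNORECASE)

truncated_match = truncated.match(subject).group(0)
complete_match = complete.match(subject).group(0)

# The truncated pattern leaves one byte of the last character unmatched.
print(len(subject) - len(truncated_match))
print(complete_match == subject)
```

The truncated form still matches, but it splits the last character in half, so anything that follows it in a longer pattern would have to start in the middle of a UTF-8 sequence.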

Might I also suggest combining the upper and lower case versions into one pattern so the whole thing looks like this:

/(?=<S>)(?!security)(?!seguridad)(?!s\xc3[\xa9\x89]curit\xc3[\xa9\x89])<S><E>(?:<C>|<G>)<U><R><I>(?:<T><Y>|<D><A><D>)/i
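A rough sketch of how the combined exclusion behaves, again in Python on UTF-8 bytes rather than in the real rule engine. The guard lookaheads are copied from the pattern above, but LOOSE is a hypothetical stand-in for the expanded <S><E>...<T><Y> tags, not the actual replace_tag definitions, and hits() is a helper invented for the example.

```python
import re

# Hypothetical stand-in for the expanded tags: accepts "3" or any high byte
# where "e" would be, and any high byte where "y" would be.
# These are NOT the real replace_tag definitions.
LOOSE = rb"s[e3\x80-\xff]+curit[y\x80-\xff]+"

# Guard lookaheads from the suggested pattern. The class [\xa9\x89] covers
# the second byte of both "é" (C3 A9) and "É" (C3 89).
GUARD = rb"(?=s)(?!security)(?!seguridad)(?!s\xc3[\xa9\x89]curit\xc3[\xa9\x89])"

unguarded = re.compile(LOOSE, re.IGNORECASE)
guarded = re.compile(GUARD + LOOSE, re.IGNORECASE)


def hits(pattern, text):
    """Return True if the pattern matches at the start of the UTF-8 bytes of text."""
    return pattern.match(text.encode("utf-8")) is not None


for word in ("security", "sécurité", "SÉCURITÉ", "s3curity"):
    print(word, hits(unguarded, word), hits(guarded, word))
```

With the single character class, one lookahead excludes the lowercase, uppercase, and mixed-case French spellings, while an obfuscated spelling such as "s3curity" still gets through to the loose part of the pattern.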

fkoyer commented Oct 8, 2025

I decided to look into why <Y> was matching "É" in the first place. I found a lot of problems with the current replace_tag definitions. I created PR #19 which fixes this bug and many others. Feedback appreciated.

jhardin-impsec commented

Fixed in SVN, let's see if that closes this PR...
